An Annotated Japanese Sign Language Corpus
Authors
Abstract
Sign language is characterized by its interactivity and multimodality, which make data collection and annotation difficult. To address these difficulties, we have developed a video-based Japanese Sign Language (JSL) corpus and a corpus tool for annotation and linguistic analysis. As the first step of linguistic annotation, we transcribed manual signs, which express lexical information, as well as non-manual signs (NMSs), including head movements, facial actions, and posture, which are used to express grammatical information. Our purpose is to extract grammatical rules from this corpus for a sign-language translation system that is under development. From this viewpoint, we discuss methods for collecting elicited data, the annotation required for grammatical analysis, and the corpus tool required for annotation and grammatical analysis. As a result of annotating 2800 utterances, we confirmed that there are at least 50 kinds of NMSs in JSL, using the head (seven kinds), jaw (six kinds), mouth (18 kinds), cheeks (one kind), eyebrows (four kinds), eyes (seven kinds), eye gaze (two kinds), and body posture (five kinds). We use this corpus to design and test an algorithm and grammatical rules for the translation system.
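As a quick consistency check, the per-articulator counts reported in the abstract can be tallied. The sketch below is only illustrative: the name `nms_kinds` and the dictionary layout are assumptions made here, not part of the corpus or its tool.

```python
# Number of non-manual sign (NMS) kinds per articulator, as listed in the abstract.
# The variable name and structure are illustrative assumptions, not the corpus format.
nms_kinds = {
    "head": 7,
    "jaw": 6,
    "mouth": 18,
    "cheeks": 1,
    "eyebrows": 4,
    "eyes": 7,
    "eye gaze": 2,
    "body posture": 5,
}

# Sum the kinds across all articulators.
total_kinds = sum(nms_kinds.values())
print(total_kinds)  # prints 50
```

The eight categories add up to 50, which matches the "at least 50 kinds" stated in the abstract.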
Similar resources
Evaluation of a Japanese CFG Derived from a Syntactically Annotated Corpus with Respect to Dependency Measures
Parsing is one of the important processes for natural language processing and, in general, a large-scale CFG is used to parse a wide variety of sentences. For many languages, a CFG is derived from a large-scale syntactically annotated corpus, and many parsing algorithms using CFGs have been proposed. However, we could not apply them to Japanese since a Japanese syntactically annotated corpus ha...
Towards an Annotation of Syntactic Structure in the Swedish Sign Language Corpus
This paper describes on-going work on extending the annotation of the Swedish Sign Language Corpus (SSLC) with a level of syntactic structure. The basic annotation of SSLC in ELAN consists of six tiers: four for sign glosses (two tiers for each signer; one for each of a signer’s hands), and two for written Swedish translations (one for each signer). In an additional step by Östling et al. (2015...
Design and recording of Czech sign language corpus for automatic sign language recognition
In this paper we describe the design, recording, and content of a Czech Sign Language database. The database is intended for training and testing sign language recognition (SLR) systems. The UWB-06-SLR-A database contains video data of 15 signers recorded from 3 different views: two of them capture the whole body and provide 3D motion data, and the third one is focused on the signer’s face and provide da...
RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus
This paper introduces the RWTH-PHOENIX-Weather corpus, a video-based, large vocabulary corpus of German Sign Language suitable for statistical sign language recognition and translation. In contrast to most available sign language data collections, the RWTH-PHOENIX-Weather corpus has not been recorded for linguistic research but for use in statistical pattern recognition. The corpus contains...
Spontaneous Speech Corpus of Japanese
Design issues of a spontaneous speech corpus are described. The corpus under compilation will contain 800-1000 hours of spontaneously uttered Common Japanese speech and morphologically annotated transcriptions. In addition, segmental and intonation labeling will be provided for a subset of the corpus. The primary application domain of the corpus is speech recognition of spontaneous speech, but we plan ...